Addressing biased occurrence data in predicting potential Sierra Nevada red fox habitat for survey prioritization

Authors

  • Casey Cleve
  • Ellen Hines
Abstract

The Sierra Nevada red fox Vulpes vulpes necator is listed as a threatened species under the California Endangered Species Act. It originally occurred throughout California’s Cascade and Sierra Nevada mountain regions. Its current distribution is unknown but should be determined in order to guide management actions. We used occurrence data from the only known population, in the Lassen Peak region of northern California, combined with climatic and remotely sensed variables, to predict the species’ potential distribution throughout its historic range. These model predictions can guide future surveys to locate additional fox populations. Moreover, they allow us to compare the relative performances of presence-absence (logistic regression) and presence-only (maximum entropy, or Maxent) modeling approaches using occurrence data with potential false absences and geographical biases. We also evaluated the recently revised Maxent algorithm that reduces the effect of geographically biased occurrence data by subsetting background pixels to match biases in the occurrence data. Within the Lassen Peak region, all models had good fit to the test data, with high values for the true skill statistic (76–83%), percent correctly classified (86–92%), and area under the curve (0.94–0.96), with Maxent models yielding slightly higher values. Outside the Lassen Peak region, the logistic regression model yielded the highest predictive performance, providing the closest match to the fox’s historic range and also predicting a site where red foxes were subsequently detected in autumn 2010. Subsetting background pixels in Maxent reduced but did not eliminate the effect that geographically biased occurrence data had on prediction results relative to the Maxent model using full background pixels.

… densities throughout the high elevations of the Sierra Nevada and southern Cascade mountain ranges of California and Oregon, USA (Grinnell et al. 1937, Hall 1981, Sacks et al. 2010).
Within this broad range, Grinnell et al. (1937) reported 3 population centers: the Mount Shasta/Lassen Peak region in northern California, the central Sierra near Mono Lake and Yosemite National Park, and the southern Sierra near Mount Whitney (now mostly in the Sequoia and Kings Canyon National Parks). In 1980, due to a noticeable decline in numbers, the Sierra Nevada red fox was listed as a state ‘threatened’ species; the factors causing its decline are unknown (CDFG 1996, 2004). Despite its former extent, the verified detections since 1991 have all been in the Lassen Peak region of northern California (Perrine et al. 2007). A recent conservation assessment for this species (Perrine et al. 2010) recommended that targeted surveys for the Sierra Nevada red fox be conducted throughout its historic range to determine whether any additional populations exist. Prior attempts to detect this species as part of multispecies carnivore inventories have been unsuccessful, even in the Lassen Peak region where the species has persisted (Zielinski et al. 2005). Surveys targeting the Sierra Nevada red fox and focusing on areas with high probabilities of the species’ occurrence may increase detection probability and survey efficiency. Our goal was to predict the extent and distribution of suitable habitat for the Sierra Nevada red fox throughout its historic range, based on the characteristics of occupied habitat in its current known range. The available data were collected during a comprehensive ecological study of the Lassen Peak population (Perrine 2005). Unfortunately, these detection data contained 2 important biases. First, the survey data contained potential ‘false absences;’ surveys have failed to detect this species in areas where populations were known to be present. Absence data for rare species may result from the rarity of the species rather than its true absence (MacKenzie et al. 2006). 
False absences are problematic in species distribution modeling because they do not indicate unsuitable habitat or confirm that the species does not occur at a given site (Guisan & Zimmermann 2000, Engler et al. 2004, MacKenzie et al. 2006). Second, the only available occurrence data for the Sierra Nevada red fox were from the Lassen Peak region, which represents only a small portion of the species’ historic range. Researchers have cautioned against using geographically biased occurrence data or transferring models to broad unsampled regions (Peterson et al. 2007, Barbosa et al. 2009); however, such projections have successfully increased detection rates (Guisan et al. 2006). Generalized linear models (GLMs) may transfer quite well to unsampled areas (Randin et al. 2006, Barbosa et al. 2009); presence-only models such as Maxent have made improvements in their transferability (Phillips 2008), with their performance being positively correlated with the similarity between the occurrence data region and the projection area (Bulluck et al. 2006). Addressing these biases provided an opportunity to explicitly compare the performance of 2 different modeling approaches: logistic regression based on presence-absence data versus maximum entropy based on presence-only data. Logistic regression has been widely used in species distribution modeling (Mladenoff et al. 1999, Nielsen et al. 2002, Johnson et al. 2004, Posillico et al. 2004, Olivier & Wotherspoon 2006). Logistic regression uses presence-absence data to model the probability of species occurrence as a function of its predictor variables, which can be continuous or categorical (MacKenzie et al. 2006). Its output is confined to values between 0.0 and 1.0. Maximum entropy methods, although relatively new and not as widely used as logistic regression, can outperform logistic regression (Elith et al. 2006) and have successfully identified locations of previously undiscovered populations (Rebelo & Jones 2010).
The most widely used maximum entropy approach, in the program Maxent (Phillips et al. 2006), estimates the species’ probability distribution that is most dispersed within the constraints of the target population information. Like logistic regression, Maxent can use both categorical and continuous predictor variables, and the output can provide information on the relative contribution of each predictor variable (Phillips et al. 2006). Maxent can utilize presence-only data, sidestepping the problem of false absences, but geographical and environmental biases in the occurrence data can introduce considerable error in presence-only models (Phillips 2008). Presence-only models such as Maxent draw background pixel values from the entire study region, while presence data values are drawn only from a small portion of the study area (Phillips et al. 2006). The resulting predictions may therefore underrepresent habitat suitability outside of the occurrence data area (Peterson et al. 2007, Phillips 2008). Substantial improvements can be made in Maxent models derived from biased occurrence data by selecting background data with similar biases as the occurrence data (Phillips 2008). This approach has recently been implemented in Maxent (Phillips 2008), but its effects have had little empirical validation. Here we compared the performance and output of a presence-absence logistic regression model versus presence-only Maxent models with and without the transferability improvements. We then combined the output from the logistic regression and updated Maxent model to generate an ensemble prediction, leveraging the strengths of each model while minimizing their respective weaknesses (Araújo & New 2007, Stohlgren et al. 2010). In addition, we explored the use of unclassified spectral data as a predictor variable in place of predetermined classification schemes (e.g. vegetation or canopy cover categories). 
Although classified maps are commonly used predictor variables, wildlife may respond to continuous environmental gradients that are not captured in the classification schemes (Laurent et al. 2005). By using unclassified spectral data, species’ occurrence can be predicted by spectrally detectable components of their habitat, rather than predetermined classification schemes that may inaccurately delineate boundaries between cover types and under-represent habitat heterogeneity (St-Louis et al. 2006). Using unclassified spectral reflectance in the distribution model may minimize errors in the resulting predictive maps (Laurent et al. 2005).

MATERIALS AND METHODS

Study area. Our model prediction area covers the area within and immediately surrounding the historic range of the Sierra Nevada red fox (Fig. 1). This includes the Sierra Nevada and the southernmost extent of the Cascade Range in California. The prediction area spans 2 Major Land Resource Area (MLRA) ecoregions: the Sierra Nevada ecoregion and the California portion of the Eastern Cascade Slopes and Foothills ecoregion (USDA-NRCS 2006). The Sierra Nevada ecoregion extends from just south of Lassen Peak to the Tehachapi Pass near Bakersfield. The majority of this ecoregion is comprised of elevations ranging from 450 to 2750 m, with the highest peak being Mount Whitney (4419 m). The California portion of the Eastern Cascade Slopes and Foothills ecoregion represents the southernmost extent of the Cascade Mountain Range, extending from the Central Cascade Mountains to the Sierra Nevada. The majority of this ecoregion is comprised of elevations ranging from 450 to 2500 m, with the highest peak being Mount Shasta (4318 m). Sierra Nevada red fox survey data were collected from the area surrounding Lassen Peak (3187 m), the southernmost peak in the Cascade Range.
The Lassen Peak region (6455 km²) includes Lassen Volcanic National Park (LVNP), the surrounding Lassen National Forest (LNF), and the immediately adjacent lands of various ownerships (Fig. 1). This montane area is dominated by conifers such as Jeffrey pine Pinus jeffreyi and Ponderosa pine P. ponderosa, red fir Abies magnifica and white fir A. concolor, and mountain hemlock Tsuga mertensiana, along with wet alpine meadows and talus slopes. This area has a Mediterranean climate with warm dry summers and cold wet winters. Most of the annual precipitation occurs as snow from November through April, with snowpacks at the higher elevations often exceeding 3 m in depth and persisting into the summer months.

Fox survey data. Sierra Nevada red fox locations were determined using 4 detection methods: radio telemetry, scat (feces) collection, and 2 camera surveys, one with an opportunistic and one with a stratified random sampling design. Each detection method contained biases.

Fig. 1. Vulpes vulpes necator. Model prediction area for the Sierra Nevada red fox relative to its historic range (Grinnell et al. 1937) and the Lassen Peak region. Note that scale bar is in miles (1 mile = ca. 1.6 km)

Five Sierra Nevada red foxes were captured and fitted with VHF collars and tracked by aerial and ground-based telemetry from 1998 through 2002 (Perrine 2005). All capture and handling activities were in accordance with California Department of Fish and Game and University of California Berkeley protocols. The field team collected 586 independent ground telemetry locations using a Trimble GeoExplorer II GPS and LOCATE II telemetry software (Nams 2001), and aerial telemetry provided 123 additional locations. In addition, a total of 227 Sierra Nevada red fox scats were collected opportunistically from June 1998 through December 2002, primarily in association with ground telemetry. The telemetry and scat locations were clustered in the western half of LVNP and the adjacent LNF lands (Perrine 2005).
The opportunistic camera station survey was conducted between 1992 and 2002 and consisted of 968 baited TrailMaster (Goodson and Associates) camera stations throughout LVNP and the LNF (Perrine 2005). This survey was conducted primarily by Park and Forest Service biologists following the standard protocol for surveying forest carnivores (Zielinski & Kucera 1995). Although the cameras were widely distributed throughout the region, sampling biases arose due to the opportunistic nature of this survey. For example, the southwest portion of LVNP and the LNF east of the Caribou Wilderness were heavily sampled, whereas the northern portion of the region had the least sampling effort. Samples were also biased toward roads. This survey yielded 50 Sierra Nevada red fox detection locations, with multiple detections at some locations. The stratified random camera survey was conducted in the summers of 2001 and 2002. This survey consisted of 24 sites stratified by elevation and randomly placed throughout the Lassen Peak region. Each site contained 2 baited TrailMaster cameras approximately 1.6 km apart, following the standard protocol for forest carnivores (Zielinski & Kucera 1995). This survey yielded 3 red fox detection locations (Perrine 2005). We combined and subsampled these occurrence data to use as the species response variable. Combining these data reduced the effect of the sampling biases inherent in each method, but new errors arose as a result of combining 4 different collection methods that spanned multiple years. Some locations contained 2 presence points, because a fox was detected in the same area by 2 different methods. These pseudoreplicates violate the assumption that training data points are independent, which in turn can bias model results (Guisan & Zimmermann 2000). Additionally, false absences likely occurred at some camera locations due to the elusiveness of this species. 
Within the 10 yr sampling period, several camera locations detected foxes in one year but not another. Having detection and non-detection records at the same location can introduce conflicting information in the model, which may lower its predictive power. To remove pseudoreplicates and correct for conflicting information, we used ArcGIS (ESRI) to label a location as a presence if a fox was detected there at any time during the 10 yr period. Similarly, we deleted duplicate detections at the same location. After pooling these data, the sampling intensity varied between habitat types. Since survey data were plentiful (~2000 records), we randomly subsampled the data from each environmental zone to balance the sampling intensity (Guisan & Zimmermann 2000, Araújo & Guisan 2006). The environmental zones were based on a combination of 10 elevation zones and 19 California Wildlife Habitat Relationship (CWHR; Mayer & Laudenslayer 1988) types. This subsampling reduced the dataset from 2000 to 1200 points (600 presences and 600 absences) and reduced but did not eliminate sample clustering. To reduce clustering bias, we used Thiessen polygons (Rhynsburger 1973) to downweight points that occurred close together.

Environmental predictor variables. We based our environmental predictor variables on Sierra Nevada red fox ecology and the availability of digital data. The Sierra Nevada red fox is associated with high-elevation conifer forests, subalpine woodlands, talus slopes, and barren areas above treeline (Grinnell et al. 1937, Schempf & White 1977, Perrine 2005). To represent these environmental conditions, we used a variety of GIS layers containing vegetation, climate, hydrology, and forest structure data.
Specifically, to represent vegetation and forest stand structure, we used CalFire Fire and Resource Assessment Program’s Multi-Source Land Cover Data (MSLC), which contained CWHR vegetation type, total tree canopy closure, tree size class, and tree density class attributes (www.frap.cdf.ca.gov). We also derived Tasseled-cap greenness and wetness from Landsat 5 imagery as an additional variable (software: ERDAS 2008, Leica Geosystems Geospatial Imaging, Atlanta, GA). Tasseled-cap transformation variables represent a continuous environmental gradient that is highly correlated with stand age and structural complexity (Hansen et al. 2001). Pixels containing high greenness and wetness values are associated with dense vegetation having high leaf area index, while lower values indicate sparsely vegetated areas such as barren areas or regions of snow and ice (White et al. 1997, Waring & Running 1998). We used Spatial Analyst (ESRI) to calculate the Euclidean (straight-line) distance from the center of each raster cell to the nearest water feature from the National Hydrography Dataset (http://nhd.usgs.gov) and to derive slope from a 30 m digital elevation model (USGS 2000). The Sierra Nevada red fox’s elevational limits, relationship to snow pack, and phylogeography (Aubry et al. 2009) suggest an affinity for specific climatic conditions. We used gridded climate data derived from the Parameter-elevation Regressions on Independent Slopes Model (PRISM) to predict the species’ response to different climatic conditions; specifically, we used mean monthly precipitation and monthly average daily minimum and maximum temperatures from 1971 through 2000 (Daly et al. 1994). After selecting the initial environmental variables based on red fox ecology, we used the R statistical package (R Development Core Team 2005) to determine correlations between continuous variables and to identify interaction terms.
If 2 variables were correlated (Pearson’s correlation coefficient > 0.3), only the variable with the lower p value was retained. We then identified pairwise interaction terms using a classification and regression tree (Miller & Franklin 2002). Classification and regression trees determine a set of if–then statements that define class membership, and can express complex non-linear and non-additive relationships among the predictor variables (Miller & Franklin 2002). We included 9 environmental variables in the classification and regression tree model: CWHR type, total tree canopy closure, tree size class, tree density class, slope, February precipitation, minimum December temperature, Tasseled-cap greenness, and distance to water. After removing correlated predictor variables and determining interactions, we selected significant predictor variables using iterative manual stepwise logistic regression; at each run, the least significant variable was removed until only significant variables remained (Hastie & Pregibon 1992, Hosmer & Lemeshow 2000). In addition, at each step the Akaike information criterion (AIC) was used to select the best fitting model. AIC is a standardized score used to compare models for best fit relative to the number of parameters in the model; lower AIC values indicate better fit (Burnham & Anderson 2002). The 9 environmental variables listed in the previous paragraph, along with the 2 pairwise interaction terms identified in the classification and regression tree model (see ‘Results’), were included in the stepwise logistic regression weighted by Thiessen polygon area. The resulting set of significant predictor variables was used in both the Maxent and logistic regression models. This allowed for direct comparison between the 2 modeling approaches.
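The correlation screening rule described above (drop one of any 2 correlated variables, keeping the one with the lower p value) might be sketched as follows. This is only an illustration under stated assumptions: the authors worked in R, whereas this sketch is plain Python; the function names `pearson_r` and `screen_correlated` are hypothetical; the absolute value of the coefficient is compared with 0.3; and variables are visited greedily from lowest to highest p value, which is one possible reading of the rule rather than the documented procedure.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Assumes neither variable is constant (sxx and syy are non-zero).
    return sxy / math.sqrt(sxx * syy)


def screen_correlated(columns, p_values, threshold=0.3):
    """Return the names of variables kept after correlation screening.

    columns:  dict mapping variable name -> list of values (hypothetical input)
    p_values: dict mapping variable name -> p value from a univariate fit
    Assumption: |r| > threshold counts as "correlated".
    """
    kept = []
    # Visit the lowest p value first so it wins any correlated pair.
    for name in sorted(columns, key=lambda v: p_values[v]):
        if all(abs(pearson_r(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy values only; these are not data from the study.
columns = {
    "slope": [1, 2, 3, 4, 5],
    "canopy": [2, 4, 6, 8, 11],   # nearly collinear with slope
    "precip": [2, 5, 1, 5, 2],    # uncorrelated with slope
}
p_values = {"slope": 0.01, "canopy": 0.04, "precip": 0.02}
kept = screen_correlated(columns, p_values)
```

With these toy inputs, `canopy` is dropped because it is strongly correlated with `slope`, which has the lower p value.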
This 2-step method of using a GLM to select predictor variables followed by Maxent modeling has been shown to create predictions with very high area under the receiver operating characteristic (ROC) curve (AUC) values (Wollan et al. 2008). High AUC values indicate low error, while lower values indicate lower predictability (Pearce & Ferrier 2000).

Distribution models and model evaluation. We generated 3 distribution models: a presence-only maximum entropy method (Maxent) with full region background pixels (hereafter, MFB), a Maxent model using a subset of background pixels with similar biases as the occurrence data (MSB), and a spatially weighted presence–absence logistic regression model (LRW). We used the default parameters for both Maxent models and generated outputs in the logistic regression format. For each approach, we developed the model with a random subset of 70% of the data and withheld the remaining 30% for model evaluation. To determine the classification accuracy of each model, we used the evaluation data to identify the optimum cutoff value that corresponded with high red fox habitat suitability. Optimum cutoff values were determined by calculating the true skill statistic (TSS) across the entire range of potential cutoff values, and the cutoff value that corresponded with the highest TSS was selected as the optimum cutoff (Allouche et al. 2006, Jones et al. 2010). We then calculated the AUC to assess predictive performance (Buckland et al. 1997). Predictive accuracy could only be tested in the Lassen Peak region because of the limited geographic extent of the available data. To compare model performance outside the Lassen Peak region, we compared the distribution and abundance of each model’s suitable habitat to the other models and to the historic range map for the Sierra Nevada red fox (Grinnell et al. 1937). We used the optimum cutoff value to determine the appropriate suitable habitat threshold.
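The cutoff selection described above can be illustrated with a short sketch. It uses the standard definition TSS = sensitivity + specificity − 1 (Allouche et al. 2006) and scans evenly spaced cutoffs, keeping the first one with the highest TSS. The function names, the 0.01 step size, and the tie-breaking rule are assumptions made for illustration; the original analysis may have differed in those details.

```python
def true_skill_statistic(observed, predicted, cutoff):
    """TSS = sensitivity + specificity - 1 at a given probability cutoff.

    observed:  sequence of 1 (presence) / 0 (absence) evaluation records
    predicted: sequence of model probabilities in [0, 1]
    Assumes the evaluation data contain at least one presence and one absence.
    """
    tp = fp = tn = fn = 0
    for obs, prob in zip(observed, predicted):
        predicted_present = prob >= cutoff
        if obs and predicted_present:
            tp += 1
        elif obs:
            fn += 1
        elif predicted_present:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def optimum_cutoff(observed, predicted, steps=100):
    """Scan cutoffs 0, 1/steps, ..., 1 and return (cutoff, TSS) with the highest TSS."""
    best_cutoff, best_tss = 0.0, float("-inf")
    for i in range(steps + 1):
        cutoff = i / steps
        tss = true_skill_statistic(observed, predicted, cutoff)
        if tss > best_tss:  # strict '>' keeps the lowest cutoff among ties
            best_cutoff, best_tss = cutoff, tss
    return best_cutoff, best_tss


# Toy evaluation set; these are not data from the study.
observed = [1, 1, 1, 0, 0, 0]
predicted = [0.9, 0.8, 0.6, 0.5, 0.2, 0.1]
cutoff, tss = optimum_cutoff(observed, predicted)
```

In this toy set the presences and absences separate cleanly, so any cutoff above 0.5 and no greater than 0.6 gives a TSS of 1; real evaluation data would rarely separate this well.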
To create the ensemble prediction, the values of each model that fell below the optimum cutoff value were given a value of 0.0. The mean probability value from both models was then assigned to each cell of the study area.
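The ensemble step described above might look like the following sketch. It assumes the 2 model outputs (LRW and MSB) are available as equally sized grids of probabilities, here plain nested lists, and that each model has its own optimum cutoff. The function name `ensemble_prediction` and the grid representation are hypothetical; the authors carried out this step in GIS software.

```python
def ensemble_prediction(lrw, msb, lrw_cutoff, msb_cutoff):
    """Average two thresholded probability grids cell by cell.

    lrw, msb: 2D lists of probabilities with identical dimensions
    Values below a model's own cutoff are set to 0.0 before averaging.
    """
    ensemble = []
    for lrw_row, msb_row in zip(lrw, msb):
        out_row = []
        for a, b in zip(lrw_row, msb_row):
            a = a if a >= lrw_cutoff else 0.0
            b = b if b >= msb_cutoff else 0.0
            out_row.append((a + b) / 2)
        ensemble.append(out_row)
    return ensemble


# Toy 2 x 2 grids and cutoffs; these are not values from the study.
lrw = [[0.8, 0.2], [0.5, 0.9]]
msb = [[0.6, 0.7], [0.1, 0.3]]
result = ensemble_prediction(lrw, msb, lrw_cutoff=0.5, msb_cutoff=0.4)
```

A cell scores high in the ensemble only when both models rate it above their cutoffs; a cell supported by one model alone receives at most half that model's probability.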


Similar resources

Dataset of Passerine bird communities in a Mediterranean high mountain (Sierra Nevada, Spain)

In this data paper, a dataset of passerine bird communities is described in Sierra Nevada, a Mediterranean high mountain located in southern Spain. The dataset includes occurrence data from bird surveys conducted in four representative ecosystem types of Sierra Nevada from 2008 to 2015. For each visit, bird species numbers as well as distance to the transect line were recorded. A total of 27847...


Using occupancy and population models to assess habitat conservation opportunities for an isolated carnivore population

An isolated population of the fisher (Martes pennanti) in the southern Sierra Nevada, California, is threatened by small size and habitat alteration from wildfires, fuels management, and other factors. We assessed the population’s status and conservation options for its habitat using a spatially explicit population model coupled with a fisher probability of occurrence model. The fisher occurren...


Data from camera surveys identifying co-occurrence and occupancy linkages between fishers (Pekania pennanti), rodent prey, mesocarnivores, and larger predators in mixed-conifer forests.

These data provide additional information relevant to the frequency of fisher detections by camera traps, and single-season occupancy and local persistence of fishers in small patches of forest habitats detailed elsewhere, "Landscape Fuel Reduction, Forest Fire, and Biophysical Linkages to Local Habitat Use and Local Persistence of Fishers (Pekania pennanti) in Sierra Nevada Mixed-conifer Fores...


On the absence of the Green-tailed Trainbearer Lesbia nuna (Trochilidae) from Venezuela: an analysis based on environmental niche modelling

Background Lesbia nuna, a hummingbird distributed in the tropical Andes, has been included in Venezuela's bird list on the basis of a specimen collected in 1873 at Sierra Nevada, Mérida and deposited in the Natural History Museum, London, with no further records for this country since then. This record, largely considered as valid by most authors, has been questioned by others, although without...


Effects of Conifers on Aspen-breeding Bird Communities in the Sierra Nevada

We examined bird-habitat relationships within and across a range of aspen habitats in four major watersheds in the eastern Sierra Nevada mountains of California and Nevada to identify habitat features of importance to aspen-breeding birds. Using point counts and vegetation assessments from 462 individual stations between 2001 and 2003 allowed us to investigate important habitat features at wate...



Publication date: 2011